ABSTRACT


The goal of this thesis was to analyze the impact of increased
utilization and deployments of Troop Program Unit (TPU) soldiers
since 9/11, controlling for the effects of demographics and of the
programs and actions meant to control attrition.

This study applied a process of data collection, data
manipulation, and data mining to the entire enlisted TPU
population, with a focus on attrition behavior.

Significant factors in determining attrition behavior included
time in service, increased bonus levels, and the Delayed Entry
Program. Mobilizations, in and of themselves, appear to have
little impact. The models we built showed significant potential
for predicting behavior. We believe that this process should be
continued and expanded to produce a tool to aid in managing and
affecting attrition.

EXECUTIVE SUMMARY


The  goal  of  this  thesis  was  to  analyze  the  impact  of 
increased  utilization  and  deployments  of  enlisted  Troop 
Program  Unit  (TPU)  soldiers  since  9/11,  controlling  for  the 
effects  of  demographics  and  of  the  programs  and  actions 
meant  to  control  attrition.  Maintaining  a  viable  and  manned 
reserve  force  in  this  environment  is  critical  to  the 
security  of  the  nation. 

We  conducted  this  study  through  a  process  of  data 
collection,  data  manipulation,  and  data-mining  algorithms 
executed  against  the  entire  TPU  population  and  focused 
toward  attrition  behavior. 

There were no "magic bullets" in the results. Time in service is
the biggest single factor in determining attrition behavior.
Increased bonus levels and the Delayed Entry Program appear to be
significant factors as well. Mobilizations, in and of themselves,
appear to have little impact. We hypothesize that the positive
attrition trends seen within these forces are due to retention
actions within the Army.

The models we built showed significant potential for predicting
behavior. We believe that this process should be continued and
expanded to produce a tool to aid in managing and affecting
attrition. We envision a system in which data on the service
member, along with responses to simple questions filled in
through Army Knowledge Online (AKO) or Human Resources Command,
could be used to focus resources and assist Retention Specialists
in retaining the right people.




I. INTRODUCTION


A.  BACKGROUND 

The United States Army Reserve (USAR) is a key component of the
Department of the Army. The Army Reserve's mission is to provide
trained and ready personnel with the skills necessary to support
and defend the nation during peacetime, emergencies, and war. The
Selected Reserve is the foremost component in meeting this
mission, and the effective management of its enlisted personnel
inventory is essential to supporting it. An important part of
this management is tracking and predicting attrition within the
force. The Global War on Terror (GWOT) has significantly changed
the utilization of Reserve Forces and has affected the attrition
behavior of the Army Reserve. The Army has taken a large number
of actions to control this attrition.

B.  THESIS  OBJECTIVE 

In this thesis, the goal is to determine and analyze the impact
of increased utilization and deployments of TPU soldiers against
the effects of various demographics and of the programs and
actions meant to control attrition. The Selected Reserve is that
portion of the reserve that is the most ready and deployable. It
is made up of three subsets: the Troop Program Unit (TPU), the
Active Guard and Reserve (AGR), and the Individual Mobilization
Augmentee (IMA). TPU soldiers are the largest portion of the
Selected Reserve, making up approximately 90% of it. They are the
classic reserve soldiers: the one-weekend-a-month,
two-weeks-every-summer soldiers who make up the backbone of the
Army Reserve. They are the citizen soldiers; because they
typically have civilian employment, they are the part of the
force most greatly affected by deployments.

TPU soldiers attrite from the Army Reserve at various times and
for various reasons. Some attrition is unavoidable and even
positive. Therefore, we need to be able to classify attrition
based on who, when, and where. For example, a TPU soldier who
leaves the force after 24 months of service for unsatisfactory
participation would be a "bad" attrition, whereas a soldier who
leaves after 300 months due to retirement would be a "good"
attrition. Also, some soldiers transfer laterally within the
military, either becoming active duty soldiers in the AGR program
or the Regular Army, or transferring elsewhere, such as to the
National Guard or the Navy. These transfers, while not bad for
the Nation as a whole, still place a burden on the Reserve's
accessioning agencies.

Ultimately, we would like to be able to identify those factors
that affect all types of attrition and use this information to
predict and possibly reduce future attrition. Outcomes from this
study could be used to determine recruiting and retention goals,
set bonus levels, and define future manpower management programs.
One possible direction for future study is to carry this data
mining process further. Corporate America uses data mining to
track behaviors and predict future behavior. We have access to a
much greater range of data on our audience. It is fully
conceivable that this work could be carried onward to create a
manpower management tool to better maintain our force, not only
to reduce attrition, but to maintain or increase levels of job
satisfaction.

C.  LITERATURE  REVIEW

A literature review of relevant studies uncovered a CNA study
entitled "Determining Patterns of Reserve Attrition since
September 11, 2001" (Dolfini-Reed, 2005). It examined attrition
across all reserve components and the trends in that attrition
since 9/11, and it was a good starting point for this analysis.
It looked at attrition trends related to deployment in the Global
War on Terror, but it did not look at multivariate effects or
take into account any interactions that may also have been
affecting attrition trends. The main factors the authors examined
were mobilizations, deployments, service and component, and time
after deployment ended. Theirs was a descriptive time series
analysis of these factors, and it showed both positive and
negative trends based on these effects. The authors suggested
conducting multivariate analysis and then described what would be
needed to develop a model for loss behavior. The model they
suggested was a Cox regression combined with a multinomial logit
regression to create a special semi-Markov process. Ideally, this
present study could be utilized in developing a multinomial
factorization in support of just such a model.

We also looked at two other studies referred to by the authors of
this first study. The first, "Retention in the Reserve and Guard
Components" (Hansen, 2004), looked at all reserve components from
FY00 to FY03. This was a multivariate analysis, but the data set
the authors used did not contain any information on mobilizations
and deployments. Some of the outcomes they found are supported by
this study. They found time in service, education levels, and
earning potential to be significant factors.

The second study was "Serving Away From Home: How Deployments
Influence Reenlistment" (Hosek, 2002). This study built an
expected utility model based on a Bayesian updating process. The
authors sought to model how previous deployments would affect the
decision to reenlist, using data on people facing reenlistment in
the FY96-FY99 timeframe. The study focused on the active
component and looked specifically at reenlistment, and is
therefore of limited utility here, but it did have some
interesting conclusions. The authors found that those who had
deployed were more likely to reenlist than those who had not.
They hypothesized that deployments helped soldiers revise their
expectations and created a bridge between past deployments and
current reenlistment decisions.




II.  RESEARCH  METHODOLOGY 


A.  DATA  VALIDATION 

1.  Data  Collection  and  Processing

The data used in this study were provided by numerous sources.
The G-18, G-17, and G-19 are flat files from the Total Army
Personnel Database-Reserve (TAPDB-R). The G-18 contains
individual personnel data on all members of the Selected Reserve.
The G-17 contains data on reserve units. The G-19 provides data
on individual unit assignments. The AllMOB is a query file
provided by USARC G-1. It contains transaction information on
every individual mobilized since September 11, 2001. The
Transaction File (XTX) is a file of TAPDB-R transactions that
involve status changes. It is from this transaction file that we
get our loss and Delayed Entry Program (DEP) data. USAR_Contract
data was provided by the United States Army Recruiting Command
(USAREC). It provides data on all people contracted into the Army
Reserve since 1999, including DEP soldiers who never made it into
the force. DJMSRC_EXTRACT is finance data recorded in the Defense
Joint Military Pay System, Reserve Component (DJMS-RC) and
extracted from the Reserve Component Management System (RCMS). It
provides data on types of bonuses and years of payment. The
DMDC_EXTRACT was extracted from the Defense Manpower Data Center
(DMDC) Montgomery GI Bill (MGIB) database. It provides data on
MGIB payments for individuals in the TPU.

The G-18 files were the most important data for this study. They
provide personnel data on every individual in the TPU (Current
Organization (CURORG) = H). This data set includes an entire
range of demographics on personnel assigned to the Selected
Reserve. These files were provided in fiscal year (FY) chunks. We
retained only enlisted TPU soldiers (CURRORG = H; MIL_PER_CA
(Military Personnel Category) = E). We also removed those data
elements that were sparsely populated or contained data that was
not pertinent to this study (e.g., administrative data and
officer-specific fields). The FY06 file served as a starting
point for creating the BASIS file. Records from FY05 with SSNs
that were already in the current BASIS file were removed; the
remainder were then appended to BASIS. This procedure was
repeated back through the FY02 G-18 data. In this way, we were
sure to have the most current data on any person in the BASIS
file. After creating this file, we removed two records because
they had erroneous SSNs.
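The append-and-deduplicate procedure for the BASIS file can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the database queries actually used in the study; the function name build_basis and the record keys ssn, fy, and grade are hypothetical.

```python
def build_basis(files_newest_first):
    """Merge yearly personnel extracts, keeping the newest record per SSN.

    files_newest_first: list of yearly extracts ordered FY06, FY05, ...
    Each extract is a list of dict records with an "ssn" key
    (a hypothetical field name used only for this sketch).
    """
    basis = []
    seen = set()
    for yearly_records in files_newest_first:
        for record in yearly_records:
            if record["ssn"] in seen:
                continue  # an earlier (more recent) record already covers this SSN
            seen.add(record["ssn"])
            basis.append(record)
    return basis


# Toy data: soldier A appears in both years, soldier B only in FY05.
fy06 = [{"ssn": "A", "fy": 2006, "grade": "E5"}]
fy05 = [{"ssn": "A", "fy": 2005, "grade": "E4"},
        {"ssn": "B", "fy": 2005, "grade": "E3"}]
basis = build_basis([fy06, fy05])
```

Processing the newest file first is what guarantees that the retained record for each SSN is the most current one.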

Six additional tables were created from the remaining data
sources. Contract_USAR was created by appending two queries
provided by Recruiting Command and checking the result for
duplicates. MOB_Count was created from the AllMOB file, a
transaction file that has a separate entry for every individual
mobilization of each TPU soldier. MOB_Count combines these
entries by SSN: it computes the total number of days mobilized,
counts the number of mobilizations, indicates whether or not the
soldier was deployed, and reports information on the last
mobilization. For DEPQF and LossQF, we combined five years of XTX
files. We located all DEP applicants (MPAPOI (previous CURRORG) =
V, MPAORG (current CURRORG) = H) and all losses (MPAPOI = H,
MPAORG <> H); DEPQF and LossQF are the final cleaned queries with
the duplicate entries removed. Where there were multiple entries,
the last was used, as transactions are often amended. The loss
data had six entries that were complete copies; these were
removed. The DJMSRC_EXTRACT_Crosstab table was created from a
crosstab query of the DJMSRC_EXTRACT data, in which we summed the
bonus amounts paid by fiscal year for each SSN. The DMDC_EXTRACT
was checked for duplicates, and the last transaction was used in
the case of multiple entries.
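The roll-up from one row per mobilization to one row per soldier, as described for MOB_Count, can be sketched as follows. This is a minimal illustration, not the query used in the study; the function name and the keys ssn, start, end, and deployed are hypothetical stand-ins for the actual transaction fields.

```python
from datetime import date


def summarize_mobilizations(transactions):
    """Collapse per-mobilization rows into one summary row per SSN."""
    summary = {}
    # Sort by start date so the last row seen per SSN is the latest mobilization.
    for t in sorted(transactions, key=lambda t: t["start"]):
        row = summary.setdefault(
            t["ssn"],
            {"total_days": 0, "mob_count": 0, "deployed": False, "last_start": None},
        )
        row["total_days"] += (t["end"] - t["start"]).days
        row["mob_count"] += 1
        row["deployed"] = row["deployed"] or t["deployed"]
        row["last_start"] = t["start"]
    return summary


# Toy transactions: soldier A mobilized twice, soldier B once.
transactions = [
    {"ssn": "A", "start": date(2005, 1, 1), "end": date(2005, 1, 31), "deployed": False},
    {"ssn": "A", "start": date(2003, 1, 1), "end": date(2003, 7, 1), "deployed": True},
    {"ssn": "B", "start": date(2004, 3, 1), "end": date(2004, 3, 11), "deployed": False},
]
mob_count_table = summarize_mobilizations(transactions)
```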

A query named FinalFile pulled together these various data sets
into one master table for analysis in Clementine. This query
joined the six smaller tables to the BASIS file (Figure 1),
providing a starting database for inclusion into Clementine.
Table 1 in Appendix A lays out the data dictionary for this final
table.


Figure 1.  FinalFile Table Query View.


2.  Clementine  Introduction

Clementine is the SPSS enterprise-strength data mining workbench.
It uses a visual interface to execute a three-step process of
reading in, manipulating, and sending data to a destination. A
Clementine "Stream" is the interface that the software uses to
conduct this process. A Stream consists of a set of nodes, each
of which performs a specific function. At the simplest level, the
shape of the node tells you its general function. Round nodes are
source nodes. They pull data from any number of sources,
including databases, Excel files, text files, and SPSS or SAS
statistical software. Hexagonal nodes are known as either field
or record nodes. These nodes prepare, transform, or otherwise
modify the data before it is introduced into any of the
algorithms at the heart of the data-mining process. These
algorithms are represented by pentagonal nodes, which execute a
variety of machine learning, artificial intelligence, and
statistical modeling methods. They allow you to derive
information from your data and create predictive models (SPSS
Inc., 2006). Square nodes are output nodes. They can provide
files of the transformed data for further work, as well as
analytical output of the results. Triangular nodes are graphing
nodes for visual analysis of the data. In order to create an
acceptable model, we must first continue the data manipulation
process using the Clementine software.

B.  DATA  AGGREGATION

1.  Clementine  Acceptability


Data is never "clean." The previous steps were conducted to
create a single data set for inclusion into Clementine. As we
introduced the data to Clementine, we began a process of
accessing and modifying the data fields to ensure that they could
be properly read by the software. This was an iterative process,
stepping back and forth between Clementine, Access, and FoxPro to
get the data into the right format and type. This first
Clementine model (Figure 2) was built to do just that. We used a
database source node and fed it into a type node and out to a
table node. We looked at the outputs and identified a number of
issues with our data. As an example, all date fields were coming
across as strings. Because we were not doing a time series study,
exact dates were not necessary, so we fixed this by going back to
Access and trimming each date field to provide only the calendar
year. We used a filter node to remove all personal data (SSN,
name) from our model. The first filler node put an "N/A" in all
blank entries for the loss fields. The second filler node was
used to repair the PPSC field, a numeric flag field of physical
profiles whose default is 111111. We also determined that using a
database source link significantly slowed the execution of
anything done in Clementine, so we used this stream to create a
flat file for the actual modeling.


Figure 2.  Initial Clementine Model.

2.  Data Processing


Figure  3  shows  the  next  set  of  steps.  We  filtered  out 
some  additional  fields  because  they  were  too  diversified, 
and  would  likely  provide  no  insight.  These  included  such 
things  as  street  addresses,  cities,  zip  codes,  and  grid 
locator  codes.  We  then  generated  a  new  flat  file  with  these 
fields  removed. 


Figure  3.  Initial  Data  Manipulation  in  the  Model. 


At this point, we began manipulating the data to prepare for
modeling. The steps we followed can be seen in Figure 4. From the
new output table, we added a partition set. A Partition node adds
an index for the machine learning process. These indexes are
randomly assigned to each data point, which is typed as either
"training," "testing," or "validation." In the machine learning
process, the algorithms use the "training" data to develop the
model. The larger the training set used, the better the model
fits the existing data, but using too much training data might
result in a model that over-fits the data. Because of the sheer
size of the data set, we decided on a 35% training, 55% testing,
and 10% validation split on this data.
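The random assignment performed by a partition step of this kind can be sketched as follows. This is an illustrative stand-in rather than Clementine's own implementation; the function name, the seed, and the use of Python's random module are assumptions, and only the 35/55/10 shares come from the text.

```python
import random

SHARES = (("training", 0.35), ("testing", 0.55), ("validation", 0.10))


def assign_partitions(n_records, seed=0, shares=SHARES):
    """Randomly label each record with a partition according to the shares."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    labels = []
    for _ in range(n_records):
        u = rng.random()
        cumulative = 0.0
        for name, share in shares:
            cumulative += share
            if u < cumulative:
                labels.append(name)
                break
        else:
            # Guard against floating point rounding in the cumulative sum.
            labels.append(shares[-1][0])
    return labels


labels = assign_partitions(10_000)
training_share = labels.count("training") / len(labels)
```

Because each record is assigned independently, the realized shares only approximate the requested ones, with the approximation tightening as the data set grows.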
The next node is a Reclassification node. This took loss data
from the "Title" field and created a flag field that identifies
a loss regardless of its type.

The Binning node was used to create a binned field based on
"CIVO." The Binning node automatically created a Derive node to
implement the bins. The bin sizes were based on bins used by Army
G-1 and contained in the TAPDB-R data reference guide. A Filler
node completed this binning, as blank fields needed to be binned
into "OTH."

The next three Derive nodes provide true/false flag indicators to
create the fields "Profile," "Mobilized," and "Deployed." The
next Derive node provides the field "TISatLOSS," which is the
number of years in service that any departing service member had
at the time of loss. "RRC" is the two-digit lead for the major
command of the individual. "CMF" is the first two digits of the
service member's primary MOS.

The first Filler node places an H in all blank DEPPER fields. Any
blanks indicated an MPAPOI of H, rather than V, meaning that the
soldier entered through the Delayed Training Program (DTP)
instead of the Delayed Entry Program (DEP). The second Filler
node placed an M (the default) in blank entries of the HT_WT_IND
data field.

One final Reclassification node, entitled Destination, created a
more generalized output field for analysis, based on some initial
outcomes from test runs. This final reclassification allowed us
to type individuals as "TPU" (those who have stayed), "OUT"
(individuals who left to a less ready status, such as the IRR, or
had a complete break from service), "MILITARY" (individuals who
went to another form of service; this includes IMA, AGR, RA,
Service Academy, or another branch of service), and "RETIRED"
(those who completed full terms of service and were retired in
one form or another).

As  a  last  node  for  data  manipulation,  we  added  a  Select 
node.  This  node  was  set  to  look  at  soldiers  who  enlisted 
since  9/11.  We  set  this  as  an  adjunct  to  run  models  on  that 
special  part  of  the  population. 


Figure  4.  Data  Manipulation  in  the  Clementine  Model. 


3.  Data  Classification 

Type nodes are used to set the role of the data for use by the
models. These nodes are used to specify field metadata and
properties, including type, label, direction, and values. They
are the final control that sets the input of the data into the
model (SPSS Inc., 2006).

The tables in Appendix A show the final settings of the Type
nodes. Grey fields were excluded from the model. Some of these
fields were excluded for reasons discussed earlier. Many others
were excluded because they were too aggregated and provided no
real information. All the other candidate output fields were
considered and rejected during the modeling process. Finally,
some fields were excluded because they had dependent values that
allowed the model to "cheat." One example was the RSC field.
Because RSCs changed to RRCs during the early part of this time
window, any record with a "RSC" value in the RSC field was an
obvious loss, corresponding to a soldier who had not been in a
unit for at least 4 years. Fields marked with an asterisk (*)
were possible output fields. The yellow field (Destination) was
the final output field.

C.  MODELING  METHODOLOGY

We  began  modeling  the  data  using  association  rule 
models.  Association  rules  are  statements  in  the  form  of  "if 
antecedents  then  consequences."  We  chose  this  model  to  find 
hidden  or  unanticipated  associations  in  the  data,  such  as 
fields  that  were  actually  output-dependent.  We  used  the 
Apriori  node  to  look  at  association  rules  in  the  data.  This 
algorithm  was  run  at  various  points  in  the  model  while  using 
either  "Title,"  "Loss,"  or  "Destination"  as  the  output.  The 
settings  were  at  5%  support  with  80%  confidence.  Outputs 
from  the  algorithm  were  used  to  eliminate  some  of  the  fields 
in  the  data  set.  The  RSC  field  was  one  such  example. 
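How a single rule is scored against support and confidence thresholds such as these can be sketched as follows. This is an illustration on toy records, not the Apriori algorithm itself (which also searches for candidate rules). It assumes that support means the share of records matching the antecedent and that confidence means the share of those records that also match the consequent; the function name and the toy data are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    antecedent and consequent are dicts of field -> required value.
    """
    matches = [r for r in records
               if all(r.get(k) == v for k, v in antecedent.items())]
    support = len(matches) / len(records)
    if not matches:
        return support, 0.0
    hits = sum(1 for r in matches
               if all(r.get(k) == v for k, v in consequent.items()))
    return support, hits / len(matches)


# Toy data: every record carrying the old "RSC" code ended as a loss,
# which is the kind of output-dependent field the text describes.
records = (
    [{"RSC": "RSC", "Destination": "OUT"}] * 2
    + [{"RSC": "RRC", "Destination": "TPU"}] * 6
    + [{"RSC": "RRC", "Destination": "OUT"}] * 2
)
support, confidence = rule_metrics(records, {"RSC": "RSC"}, {"Destination": "OUT"})
passes_thresholds = support >= 0.05 and confidence >= 0.80
```

A rule with confidence at or near 100% on a field like this is a signal that the field encodes the outcome and should be removed from the inputs.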


Figure  5.  Apriori  Node  in  Clementine  Model. 


For this study, we focused on decision tree models. A decision
tree allows us to see how different factors and interactions
relate to the outcomes we are trying to determine and analyze.
There are four types of decision tree algorithms in Clementine:
Classification and Regression (C&R), CHAID, QUEST, and C5.0.
CHAID and QUEST produced the weakest results and were rejected.

C&R and C5.0 both partition data recursively along splits
generated from the outcomes in the training data. The C&R tree
computes an "impurity index" for the data and looks for the split
that provides the greatest reduction in that index. The C&R tree
will only provide binary splits. The C5.0 tree works in the same
fashion, but adds a couple of features. C5.0 is not limited to a
binary split, and it can utilize "boosting." Boosting allows the
algorithm to build multiple successive models. Each successive
model attempts to repair the errors in the previous model, and
all of these models are then allowed to "vote," providing a
boosted prediction (SPSS Inc., 2006). The advantage of the C5.0
algorithm is better demonstrated prediction, at the cost of a
much more complicated tree. The C&R tree gives a much easier to
comprehend outcome, but with lower success in prediction.
Figure 6 displays the model set-up for both the Full set and the
Junior Enlisted set.
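The search for the split with the greatest impurity reduction can be sketched as follows. This is a minimal illustration of the idea, not Clementine's implementation. It assumes the Gini index as the impurity measure and a single numeric field; the function names and the toy bonus data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, impurity_reduction) for the best split x <= threshold."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        reduction = parent - weighted
        if reduction > best[1]:
            best = (threshold, reduction)
    return best


# Toy data: low bonus totals leave, high bonus totals stay.
bonus = [0, 1000, 2000, 4000, 5000, 6000]
outcome = ["OUT", "OUT", "OUT", "TPU", "TPU", "TPU"]
threshold, reduction = best_binary_split(bonus, outcome)
```

A full tree applies this search to every candidate field at each node and then recurses into the two resulting subsets.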
III.  DISCUSSION  AND  FINDINGS


A.  PARTIAL  RUNS 

As  discussed  in  the  previous  chapters,  we  conducted 
some  interim  modeling  as  we  went  through  the  development  of 
the  final  model  structure.  We  used  the  Apriori  algorithm 
and  the  C&R  tree  to  look  at  results  along  the  way.  These 
algorithms  ran  in  a  relatively  rapid  manner  (minutes  rather 
than  hours),  and  we  were  able  to  use  them  to  find  input  data 
fields  that  were  dependent  on  response  fields.  These 
dependent  fields  needed  to  be  excluded  from  the  model. 

We  conducted  experimental  runs  with  Neural  Nets,  which 
produced  significantly  lower  prediction  success.  This  may 
have  been  a  function  of  not  allowing  the  runs  to  complete, 
but  after  19  hours,  the  most  successful  model  was  still  only 
at  approximately  58%  prediction  accuracy. 

B.  C&R  TREE  OUTCOMES 

1.  Full  Data  Set 

The following graph is the final decision tree produced for the
entire data set, using the C&R algorithm. As an example, the
algorithm found the greatest impurity reduction at a split on a
total bonus amount of $2985. Going down the right side of the
tree, those with a bonus greater than $2985 are checked for
DMOSQ. Those with a DMOSQ of A or X are predicted to get out.
This makes sense: a person with an A or X has usually failed to
complete training and will most likely not receive his or her
bonus. For those with the other DMOSQ codes, the model then makes
cuts based on Family Care Plan status, time in the reserves, and
deployment status. Notice that many splits do not change the
decision of the model, but do modify the confidence of the
results.
Figure 7.  C&R Decision Tree (Entire Population)
This model provides the following results.

Partition        Correct             Wrong               Total
1_Training        74,545  70.99%     30,470  29.01%     105,015
2_Testing        148,420  70.95%     60,756  29.05%     209,176
3_Validation      24,597  70.81%     10,142  29.19%      34,739

Table 1.  C&R Results


The 71% accuracy should be compared to that of the naive model,
which simply predicts the most common outcome for every
observation. In this case, the naive model would predict TPU for
all cases and be correct 52% of the time. Notice that this model
never makes a prediction of Military, but appears to be splitting
that category between the Out and TPU categories. Most
critically, those who actually remained in the TPU were correctly
predicted to do so about 88% of the time.


'Partition' = 1_Training
                         Predicted
Actual             Out    Retired        TPU
Military         7,499        152      5,569
Out             22,918        767      8,781
Retired            360      3,438      1,134
TPU              4,868      1,340     48,189

'Partition' = 2_Testing
                         Predicted
Actual             Out    Retired        TPU
Military        14,796        319     11,044
Out             46,167      1,459     17,769
Retired            768      6,580      2,166
TPU              9,872      2,563     95,673

'Partition' = 3_Validation
                         Predicted
Actual             Out    Retired        TPU
Military         2,437         60      1,843
Out              7,543        247      2,884
Retired            124      1,057        351
TPU              1,752        444     15,997

Table 2.  C&R Predictions vs. Outcomes
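The summary figures quoted for this model can be recomputed from the training partition of Table 2. The sketch below is only a check on the arithmetic; the counts are taken from the table, with rows read as actual outcomes and columns as predictions, and the variable names are illustrative.

```python
# Training partition of Table 2: actual outcome -> predicted outcome -> count.
confusion = {
    "Military": {"Out": 7499, "Retired": 152, "TPU": 5569},
    "Out": {"Out": 22918, "Retired": 767, "TPU": 8781},
    "Retired": {"Out": 360, "Retired": 3438, "TPU": 1134},
    "TPU": {"Out": 4868, "Retired": 1340, "TPU": 48189},
}

total = sum(sum(row.values()) for row in confusion.values())

# Correct predictions lie on the diagonal; Military is never predicted.
correct = sum(row.get(actual, 0) for actual, row in confusion.items())
accuracy = correct / total

# The naive model predicts the most common actual outcome for everyone.
naive_accuracy = max(sum(row.values()) for row in confusion.values()) / total

# Share of soldiers who actually stayed in the TPU and were predicted to stay.
tpu_recall = confusion["TPU"]["TPU"] / sum(confusion["TPU"].values())

print(f"accuracy={accuracy:.4f} naive={naive_accuracy:.4f} tpu_recall={tpu_recall:.4f}")
```

The same calculation applies unchanged to the testing and validation partitions.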
2.  Junior Soldier Data Set


The  first  model  generated  for  the  junior  soldier  set 
(Figure  8)  was  accurate  but  uninformative.  It  tells  us  that 
not  completing  training  is  the  factor  most  correlated  with 
loss  in  this  junior  enlisted  set.  The  Army  Reserve  has 
already  recognized  this  issue.  This  was  one  of  the  main 
reasons  for  implementing  the  DEP.  A  large  percentage  of  the 
soldiers  entering  into  the  Army  Reserve  never  complete 
training  and  are  never  a  viable  asset  to  the  Army.  The  DEP 
was  meant  to  prevent  these  undeployable  assets  from  being 
accessed  into  the  Force  until  they  leave  for  training. 
Figure 8.  C&R Decision Tree (Junior Soldiers)

We reran the model after excluding this factor from the data set.
This produced the following tree (Figure 9), which provided
further insight into other factors affecting attrition. The
prediction results for this tree were slightly less accurate than
those of the previous tree.
[Decision tree for $R-Destination. Node statistics legible in the
figure are listed below as percent and count (n) per category.]

Node    Military          Retired        TPU               Out               Total
  0     10.182   3396     0.054   18     58.418  19484     31.347  10455     100.000  33353
  1     11.417   1228     0.037    4     30.216   3250     58.330   6274      32.249  10756
  2     13.538   1042     0.039    3      5.781    445     80.642   6207      23.077   7697
  3      6.080    186     0.033    1     91.697   2805      2.190     67       9.172   3059
  4      9.594   2168     0.062   14     71.841  16234     18.502   4181      67.751  22597
  6     14.519   1006     0.144   10     60.846   4216     24.491   1697      20.775   6929
  7      9.610   1094     0.035    4     69.413   7902     20.942   2384      34.132  11384
  8     22.657    336     0.270    4     49.225    730     27.849    413       4.446   1483
 11      7.656    758     0.000    0     72.437   7172     19.907   1971      29.685   9901

Split fields and cut points legible in the figure: ALTX (<= 0.500,
> 0.500); DtENTRES (<= 2004.500, > 2004.500); DtLRA (<= 2004.500,
> 2004.500); CMXT (<= 21.000, > 21.000); Total Of Amount
(<= 2958.500, > 2958.500); and further splits on AMT05 and MILSPI
(<= 1125.000, > 1125.000; <blank>; N; Y).

Figure 9.  Second C&R Decision Tree (Junior Soldiers)


The results for this tree are as follows (Table 3).
The prediction accuracy of this tree was less than one
percent lower than that of the first tree. The 72% accuracy
found here should be compared with the 58% accuracy of the
naive model.


'Partition'    1_Training          2_Testing           3_Validation
Correct        24,173   72.48%     48,191   71.92%      8,096   72.47%
Wrong           9,180   27.52%     18,812   28.08%      3,076   27.53%
Total          33,353              67,003              11,172

Table 3. Junior Soldier C&R Results
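The comparison between the tree and the naive model can be checked directly from the counts reported for the training partition. The sketch below assumes the naive model is the one that always predicts the most common outcome (TPU), which is consistent with the 58% figure; the function and variable names are illustrative and not part of the original study.

```python
# Outcome counts for the junior-soldier training partition
# (root node of the second C&R tree, Figure 9).
training_counts = {"Military": 3396, "Retired": 18, "TPU": 19484, "Out": 10455}


def naive_accuracy(counts):
    """Accuracy of a model that always predicts the most common class."""
    return max(counts.values()) / sum(counts.values())


def model_accuracy(correct, total):
    """Share of records the model classified correctly."""
    return correct / total


baseline = naive_accuracy(training_counts)
cr_tree = model_accuracy(correct=24173, total=33353)  # Table 3, training column

print(f"naive model: {baseline:.2%}")
print(f"C&R tree:    {cr_tree:.2%}")
print(f"gain:        {cr_tree - baseline:.2%}")
```

Under this reading, the gain of the tree over the naive model is the difference between the two accuracies, roughly 14 percentage points on the training partition.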


This model, looking at just junior soldiers, would (and
should) never predict a retirement; the retirements that
are in this data set are medical in nature. Looking at
Table 4, we see that this model does a very good job of
predicting those who stayed in the TPU, but a noticeably worse
job of predicting who would get out. The real gain in this
model, though, is the underlying data on how strong
each prediction is. The decision tree shows percentages at
the end nodes. These can be interpreted as predicted
conditional probabilities and used to build specific prediction
models of how many people are at risk of being lost,
based on the composition of the force. In this way, we are
able to classify individuals by their level of risk. Table
4 breaks out the predictions for this model against actual
performance for the three partitioned sets.
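A minimal sketch of how node percentages could be used as conditional probabilities follows. The probabilities are the "Out" percentages read from three nodes of Figure 9; the unit composition and the risk thresholds are hypothetical values chosen only for illustration, and the function names are not from the original study.

```python
# P(Out | node), taken from the "Out" percentages of three nodes in Figure 9.
P_OUT_BY_NODE = {"Node 2": 0.80642, "Node 3": 0.02190, "Node 6": 0.24491}


def expected_losses(headcount_by_node, p_out_by_node):
    """Expected number of losses: sum of head count times P(Out) per node."""
    return sum(n * p_out_by_node[node] for node, n in headcount_by_node.items())


def risk_level(p_out, high=0.5, medium=0.2):
    """Bucket a predicted probability; the thresholds are assumptions."""
    if p_out >= high:
        return "high"
    if p_out >= medium:
        return "medium"
    return "low"


# Hypothetical unit: number of soldiers falling into each node.
unit = {"Node 2": 10, "Node 3": 40, "Node 6": 50}

losses = expected_losses(unit, P_OUT_BY_NODE)
print(f"expected losses out of {sum(unit.values())}: {losses:.1f}")
for node, p in P_OUT_BY_NODE.items():
    print(node, risk_level(p))
```

The same calculation could be repeated for any force composition once each soldier has been assigned to an end node of the tree.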
'Partition' = 1_Training
               Out       TPU
Military     1,040     2,356
Out          5,471     4,984
Retired          2        16
TPU            782    18,702

'Partition' = 2_Testing
               Out       TPU
Military     2,006     4,838
Out         11,035    10,298
Retired          5        20
TPU          1,645    37,156

'Partition' = 3_Validation
               Out       TPU
Military       329       816
Out          1,798     1,641
Retired          2         6
TPU            282     6,298

Table 4. Jr. Soldier C&R Predictions vs. Outcomes
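The claim that the tree predicts TPU stayers well but losses less well can be quantified from Table 4. The sketch below reads each row as the actual outcome and each column as the predicted outcome; that reading is an inference, supported by the row totals matching the class counts at the root of Figure 9. The helper names are illustrative.

```python
# Table 4, training partition: actual outcome -> {predicted outcome: count}.
TABLE4_TRAINING = {
    "Military": {"Out": 1040, "TPU": 2356},
    "Out": {"Out": 5471, "TPU": 4984},
    "Retired": {"Out": 2, "TPU": 16},
    "TPU": {"Out": 782, "TPU": 18702},
}


def recall(matrix, label):
    """Share of records with this actual outcome that were predicted correctly."""
    row = matrix[label]
    return row.get(label, 0) / sum(row.values())


def accuracy(matrix):
    """Share of all records whose prediction matches the actual outcome."""
    correct = sum(row.get(actual, 0) for actual, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


tpu_recall = recall(TABLE4_TRAINING, "TPU")
out_recall = recall(TABLE4_TRAINING, "Out")
overall = accuracy(TABLE4_TRAINING)

print(f"TPU recall: {tpu_recall:.2%}")
print(f"Out recall: {out_recall:.2%}")
print(f"accuracy:   {overall:.2%}")
```

On these counts the tree recovers about 96% of soldiers who stayed in the TPU but only about 52% of those who got out, and the overall accuracy reproduces the training figure in Table 3.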


C.  C5.0  TREE  OUTCOMES 

The trees generated from the C5.0 algorithm are much
larger and much more complicated than those of the C&R
algorithm. However, they were also significantly more
informative and had better prediction outcomes. Because of
their complexity, we are unable to display the decision
trees in this document.

1.  Full  Data  Set 

The full set model used 8 trees to boost its results
and provided up to 14 levels of significant factors in each
tree. The most prevalent factors correlated with attrition
included time in service, unit, the Delayed Entry Program,
dependency, and education. Many other factors appeared but
were much less prevalent. The prevalence of a factor was
determined by the number of times it appeared and how high
in the tree it appeared. The results for this model are
listed in Table 5.
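The prevalence criterion described above (how often a factor appears and how high in the tree) can be sketched as a simple score. The text does not give a formula, so the weighting of 1 / (depth + 1) per appearance is purely an assumption, and the split lists below are hypothetical rather than taken from the actual C5.0 trees.

```python
from collections import defaultdict


def prevalence_scores(trees):
    """Score each factor across a set of boosted trees.

    `trees` is a list of trees; each tree is a list of (factor, depth)
    pairs, one per split, with depth 0 for the root. Each appearance
    contributes 1 / (depth + 1), so frequent and shallow splits score
    highest. This weighting is an illustrative assumption.
    """
    scores = defaultdict(float)
    for splits in trees:
        for factor, depth in splits:
            scores[factor] += 1.0 / (depth + 1)
    # Return factors ordered from most to least prevalent.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


# Hypothetical split records for two boosted trees.
boosted_trees = [
    [("time_in_service", 0), ("unit", 1), ("DEP", 1), ("education", 2)],
    [("time_in_service", 0), ("DEP", 1), ("dependents", 2)],
]

scores = prevalence_scores(boosted_trees)
for factor, score in scores.items():
    print(f"{factor:16s} {score:.2f}")
```

Any monotone weighting of depth would serve the same purpose; the point is only that both frequency and position in the tree feed the ranking.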


'Partition'    1_Training           2_Testing            3_Validation
Correct         83,982   79.97%     164,465   78.63%     27,348   78.72%
Wrong           21,033   20.03%      44,711   21.37%      7,391   21.28%
Total          105,015              209,176              34,739

Table 5. C5.0 Results

The data in Tables 5 and 6 show the improved performance
of the C5.0 tree over the C&R tree for modeling this data.


'Partition' = 1_Training
            Military       Out   Retired       TPU
Military       2,867     7,378       223     2,752
Out            1,048    26,100       836     4,482
Retired            3       295     4,006       628
TPU              325     2,711       352    51,009

'Partition' = 2_Testing
            Military       Out   Retired       TPU
Military       4,956    14,934       469     5,800
Out            2,447    51,710     1,670     9,568
Retired           14       678     7,407     1,415
TPU              798     6,153       765   100,392

'Partition' = 3_Validation
            Military       Out   Retired       TPU
Military         849     2,462        80       949
Out              417     8,401       285     1,571
Retired            3       107     1,189       233
TPU              155     1,005       124    16,909

Table 6. C5.0 Prediction vs. Outcomes


2. Junior Soldier Data Set

The Junior Enlisted set model used 10 trees to boost
its results and provided up to 16 levels of significant
factors in each tree. The most prevalent factors included
unit, the Delayed Entry Program, DMOSQ, and marital status.
Many other factors appeared but were much less prevalent.
The results for this model are listed in Table 7.


Results for output field Destination

'Partition'    1_Training          2_Testing           3_Validation
Correct        29,657   88.92%     57,049   85.14%      9,537   85.37%
Wrong           3,696   11.08%      9,954   14.86%      1,635   14.63%
Total          33,353              67,003              11,172

Table 7. Jr. Soldier C5.0 Results


'Partition' = 1_Training
            Military       Out   Retired       TPU
Military         965     1,886         0       545
Out              117     9,777         0       561
Retired            1         6         2         9
TPU               63       508         0    18,913

'Partition' = 2_Testing
            Military       Out   Retired       TPU
Military         950     4,403         0     1,491
Out              729    18,948         0     1,656
Retired            4        13         0         8
TPU              204     1,445         1    37,151

'Partition' = 3_Validation
            Military       Out   Retired       TPU
Military         170       730         0       245
Out              131     3,050         0       258
Retired            0         6         0         2
TPU               22       241         0     6,317

Table 8. Jr. Soldier C5.0 Prediction vs. Outcomes
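One way to compare the two junior-soldier models is to look at how much accuracy each loses between the training and validation partitions. The sketch below uses the correct and total counts from Tables 3 and 7; the notion of a "generalization gap" and the helper names are additions for illustration, not measures used in the original study.

```python
# (correct, total) per partition, from Table 3 (C&R) and Table 7 (C5.0).
RESULTS = {
    "C&R": {"training": (24173, 33353), "validation": (8096, 11172)},
    "C5.0": {"training": (29657, 33353), "validation": (9537, 11172)},
}


def accuracy(correct_total):
    """Accuracy from a (correct, total) pair."""
    correct, total = correct_total
    return correct / total


def generalization_gap(model):
    """Training accuracy minus validation accuracy for one model."""
    parts = RESULTS[model]
    return accuracy(parts["training"]) - accuracy(parts["validation"])


gap_cr = generalization_gap("C&R")
gap_c50 = generalization_gap("C5.0")

for name, gap in (("C&R", gap_cr), ("C5.0", gap_c50)):
    valid = accuracy(RESULTS[name]["validation"])
    print(f"{name:5s} validation accuracy {valid:.2%}, gap {gap:.2%}")
```

On these counts the C5.0 model gives up about 3.5 percentage points between training and validation, while the C&R tree gives up almost nothing, yet the C5.0 model remains roughly 13 points more accurate on the validation partition.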
IV. CONCLUSIONS AND RECOMMENDATIONS


This study can provide the framework to support the
type of Markov process prediction model that the authors of
the "Patterns of Reserve Attrition" study (Dolfini-Reed,
2005) described. Additionally, because it provides
information about individuals, this process can be used to
develop a retention tool that helps Retention NCOs
and Commanders better identify who may be at risk and focus
limited resources toward maintaining the force.
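To illustrate what a Markov process model of this kind might look like, the sketch below projects a cohort across the four destination states used in this study. It is not the Dolfini-Reed model: the transition row out of the TPU simply reuses the class shares at the root of Figure 9 as if they were one-period probabilities, and all other states are assumed to be absorbing. Both are assumptions made only for the example.

```python
STATES = ["TPU", "Military", "Retired", "Out"]

# Shares at the root node of Figure 9, used as a stand-in for one-period
# transition probabilities out of the TPU state (an assumption).
root_shares = {"TPU": 0.58418, "Military": 0.10182, "Retired": 0.00054, "Out": 0.31347}
total_share = sum(root_shares.values())

# Normalize so the TPU row sums to exactly 1 despite rounding in the figure.
TRANSITIONS = {"TPU": {s: root_shares[s] / total_share for s in STATES}}

# Assumption: every non-TPU state is absorbing (no returns to the TPU).
for state in STATES[1:]:
    TRANSITIONS[state] = {s: 1.0 if s == state else 0.0 for s in STATES}


def step(distribution):
    """Advance the head count in each state by one period."""
    following = {s: 0.0 for s in STATES}
    for origin, count in distribution.items():
        for destination, probability in TRANSITIONS[origin].items():
            following[destination] += count * probability
    return following


def project(distribution, periods):
    """Apply the transition step repeatedly."""
    for _ in range(periods):
        distribution = step(distribution)
    return distribution


cohort = {"TPU": 1000.0, "Military": 0.0, "Retired": 0.0, "Out": 0.0}
after_two = project(cohort, 2)
print({state: round(count, 1) for state, count in after_two.items()})
```

A usable model would estimate a separate transition row for each risk group and time period, which is where the time series analysis recommended in this chapter would come in.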

There is not as much insight into factors as one would
have hoped. Time in service was, unsurprisingly, the
largest factor in determining attrition behavior. Every
model we generated using the full data set found that
behavior seemed to change at 12 to 14 years and again at 18
years of service. The C5.0 model found a distinct and positive
split around the Delayed Entry Program. Bonus amounts at
$2985 and $3500 were also significant in both sets of data.
MGIB data was an almost non-existent factor in any of the
trees. On reflection, we believe this may actually be a
function of not having the right data for this field. Better
data might be who is enrolled in the program, rather
than who is receiving benefits. During our time with
Recruiting Command, it was a commonly held belief that the
MGIB was the most cost-efficient of the available
benefits, as a large number of people signed up because of
it, but a much smaller number of people ever actually used
it. Surprisingly, the mobilization, combat, and other "go to
war" indicators seemed to be insignificant factors in
attrition. When they did show up in a model, they displayed
a positive trend with increased usage. This supports the
trends shown in the other studies considered. Therefore, it
may be tempting to infer that the war is not having a
negative impact on the decision of individuals to remain
serving in the Army Reserve. The generally negative
outcomes from post-mobilization surveys and the downward
trend in propensity to join the military that the Army seems
to be experiencing (personal communications, 2006) would
seem to indicate, however, that the potential for attrition
should increase as well. The data from this study provides
some support for the conclusions of Hosek and Totten (2002),
who hypothesized that deployments helped to vest interest
in service and reduced naivete about military service.
Although this data neither supports nor refutes that claim,
our theory is that the Army has in the past, and is
currently, proactively attacking what was seen as a potential
manpower problem and has effectively eliminated it.

We believe there is significant potential for expansion
and follow-on work from this thesis. Conducting similar
studies against the remainder of the Selected Reserve should
be equally informative. Further regression models might
predict time in service when a loss occurs. Another
critical step in developing the model proposed in the
Dolfini-Reed paper would be to conduct time series analysis
with these data to support the Markov process model.

The models we did build showed significant potential
for predicting behavior. We believe that this process
should be continued and expanded into a tool to aid in and
affect attrition. We envision a system in which data on the
service member, along with responses to simple questions
filled in through Army Knowledge Online (AKO) or Human
Resources Command, could be used to focus resources and
assist Retention Specialists in retaining the right people.
Additionally, we encountered difficulty with obtaining data.
In some cases, parochialism, proprietary attitudes, and
"stovepiping" prevent data from being available to the
analytic cells we worked with in the Army Reserve. Some of
the data manipulations in this study had never been done
before because of just these problems. There are many
other data sets that could possibly improve on
this study as well. Some data sets we know to exist but
were unable to use include tuition assistance and retirement
points.

A final recommendation would be to set up a data
warehouse for the Army Reserve analytical cells. This might
be controlled by OCAR-PAE or USARC and act as a single
source of study data for the Army Reserve. One possible
solution would be to integrate these other data into TAPDB-R
or RCMS.
APPENDIX A DATA DICTIONARY

Table 9. Data Dictionary

DBF NAME | DATA REFERENCE NAME | DESCRIPTION | OWNING USARC/USAR AGENCY
SSN | SOCIAL SECURITY NUMBER | A soldier's Social Security Number | DCSPER
UIC | CURRENT UIC | The Unit Identification Code that identifies the organizational assignment for a soldier | DIRFP
RSN_FLAG | SUSP FAVOR PERS ACTION REASON | The reason for suspending favorable personnel actions (flag) for a soldier | DCSPER
DtF | DATE SUSP FAVOR PERS ACTION | Year of suspension of favorable personnel actions (flag) FORMAT: CCYY | DCSPER
NAME | NAME INDIVIDUAL | A soldier's full name | DCSPER
MIL_PER_CA | MILITARY PERSONNEL CLASS | The categories into which the soldiers of the Armed Forces are divided based upon their grade and status (Commissioned Officer (CO), Warrant Officer (WO), Enlisted (ENLD), Academy Cadet) (Limited to E only) | DCSPER
DtBiY | DATE OF BIRTH | The year a soldier was born FORMAT: CCYY | DCSPER
HIGH | HEIGHT INDIVIDUAL | A soldier's actual height, in inches, as indicated during the most current weigh-in or medical examination | SURGEON
WGNT | WEIGHT INDIVIDUAL | A soldier's actual weight, in pounds, as indicated during the most current weigh-in or medical examination | SURGEON
HT_WT_IND | HGT/WT ACCEPTABILITY INDIC | Indicates whether or not a soldier's weight is acceptable for the soldier's height, even if not within the Army prescribed limits | SURGEON
SEX | SEX | The sex of a soldier | SURGEON
ETH_GP | ETHNIC GROUP | A soldier's ethnic group (A segment of the population that possesses common characteristics and a cultural heritage significantly different from that of the general U.S. population and closely identifies with that cultural heritage) | DCSPER
RACE | RACE/POPULATION GROUP | A soldier's race (A division of the human population having descent or origin in particular peoples or racial groups) | DCSPER
CITZ | CITIZENSHIP STATUS | The legal (statutory) origin of a soldier's United States citizenship status | DCSPER
MAST | MARITAL STATUS | A soldier's legal marital status | DCSPER
DEPN | NUMBER OF DEPENDENTS | The number of dependents for a soldier (Dependents: Persons for whom the sponsor (normally the head of a household) provides support in accordance with the provisions of DOD Military Pay and Allowance Entitlements manual) | DCSPER
RELI | RELIGIOUS DENOMINATION | A soldier's religious denomination (A sect or group of individuals with similar theological beliefs) | CHAPLAIN
CIVO | CIVILIAN OCCUPATIONAL CATEGORY | The general category into which a soldier's civilian occupation is classified based on the type of work performed | DCSPER
STREET | STREET ADDRESS | The street address portion of a soldier's address | CIO
CITY | ADDRESS CITY | The name of the city in a soldier's address | CIO
STATE | STATES/TERRITORIES OF US | The name of the state in a soldier's address | CIO
ZIP | ZIP CODE | A 5-/9-digit zip code in a soldier's address | CIO
ZGLC | GRID LOCATOR CODE | A code denoting a specific geographic location within the boundaries of the Continental US; developed by using WIAP-Z to divide the Continental US into quadrants 15 miles square (000000 - 995995) | SYS GEN
GRAS | GRADE ARMED SERVICE | A soldier's grade (A rating in a graduated progression of ratings in an Armed Service; this rating is equal to a grade level or is in a relative position between grade levels within the United States hierarchy of grades) | SYS GEN
GRADE | GRADE TITLE - US ARMY | The 3-character abbreviation of the rank a soldier holds in the United States Army (COL, CPT, CW3, SGT, PV1) | DCSPER
DtR | DATE OF RANK - RESERVE | The year a soldier's rank in the reserves became effective - This date establishes the relative seniority of a soldier among others who possess the same Reserve military grade FORMAT: CCYY | DCSPER
DtPEBD | PAY ENTRY BASIC DATE | The date that establishes the beginning of a soldier's creditable service for pay purposes (Equals the date of enlistment for NPS gains) FORMAT: CCYY | DCSCOMPT
DtEXP | EXPIRATION STATUTORY MIL OBLG DATE | The date when a soldier has completed or will complete a period of service required by statute (The initial period of service, active or reserve, required by statute is 8 years) FORMAT: CCYY | OCAR RTD
DtTPUEXP | EXPIRATION OF TPU SERVICE DATE | The date indicating the expiration of the period a soldier is currently obligated or expected to serve as a member of the Selected Reserve with either a Reserve unit (TPU) or on an active duty tour (AGR) FORMAT: CCYY | OCAR RTD
DtLRA | DATE LAST RELEASED ACTIVE DUTY | The date a soldier last completed a period of active duty or active duty for training (Non-TPU training AD) FORMAT: CCYY | DCSPER
ACT_FEDSVC | NUMBER MONTHS ACT FED SVC | A soldier's cumulative, creditable period of full-time active duty, expressed in 30-day increments (Includes periods of AT, ADT, ADSW, IADT, etc.) | DCSPER
PPSC | PHYSICAL PROFILE SERIAL | (PULHES) - An estimate of the overall ability of a soldier to perform military duties by consideration of the physical and mental condition (PULHES consists of six numbers - each from 1-4 - indicating a rating for the soldier in each of the following categories: Physical Capacity Indicator (P), Upper Extremities Capacity Indicator (U), Lower Extremities Capacity Indicator (L), Hearing/Ears Capacity Indicator (H), Eyes/Vision Capacity Indicator (E), Psychiatric Capacity Indicator (S) in that sequence - Example: 111111 indicates no limitations in any category) | SURGEON
PHCC | PHYSICAL CATEGORY | Represents certain combinations of physical profile serial codes (PULHES) and the most significant duty limitations | SURGEON
APFT_IND | APRT INDICATOR | Designates that a soldier passed or failed the last performance of the Army Physical Readiness Test | DCSOPS
DEPL | PERS DEPLOYABILITY LIMITATION | The most significant factor which precludes the overseas assignment of a soldier during full mobilization | DCSPER
MILED_COMP | MIL EDUC COMPLETED STATUS | The completion status of a soldier's military professional training | DCSOPS
CIED | CIVILIAN EDUC CERT COMPLETED | The highest level of formal academic education, in an approved program of study at a non-military institution or service academy, attained by a soldier (Completion should be recognized or certified by a diploma, degree, document, or other certificate) | DCSOPS
CVEL | CIVILIAN EDUCATION LEVEL | The highest level of formal academic (non-military) education obtained by a soldier | DCSOPS
MSCE | MAJOR SUBJECT COLLEGE EDUC | A soldier's major field of study for the highest civilian education attained | DCSPER
DtENTRY | YR-MO INITIAL ENTRY MIL SVC | The year and month a soldier was first commissioned or enlisted in any military service of the United States (Active or Reserve) - This date is fixed and is not adjusted for breaks in service FORMAT: CCYY | DCSPER
DtENTRES | YR-MO INITIAL ENTRY RES | The year and month a soldier affiliates or enlists in any Reserve component (non-EAD) for the first time - This year and month is fixed and would not be adjusted for breaks in Reserve service (For non-prior service members, this year and month would equal the year and month of initial entry military service - often blank for pre-reservist if not entered from OMPF) FORMAT: CCYYMM | DCSPER
AFSG | AFQT SCORE GROUPS | The aggregated percentile test score group into which a soldier's score on the Armed Forces Qualification Test falls | DCSPER
AFQT | AFQT PERCENTILE SCORE | The percentile score attained by an examinee on the Armed Forces Qualification Test | DCSPER
DtETS | EXPN READY RESERVE OBLG DATE | A date indicating the expiration of the period an enlisted soldier is required by law or contractual agreement to serve as a member of the Ready Reserve (TPU, AGR, Control Group) FORMAT: CCYY | OCAR RTD
NEXE | NBR OF ENLISTMENT EXTENSIONS | The number of extensions associated with a soldier's current enlistment | OCAR RTD
CMXT | CUMULATIVE MONTHS EXTENSION | The total (cumulative) number of months a soldier has extended his/her current ready reserve obligation | OCAR RTD
PMOS | PRIMARY MOSD - ENLISTED | The Military Occupational Specialty Designator (MOSD) of an enlisted soldier that is of first significance to the Army in terms of training, experience, demonstrated qualifications, and Army needs | DCSPER
SKLVL | SKILL LEVEL | Level of proficiency required for performance of a specific military job, and the level of proficiency at which a soldier qualifies in the Military Occupational Specialty (MOS) (The 4th character in a Primary MOSD - Enlisted) | DCSPER
SMOS | SECONDARY MOSD - ENLISTED | Identifies a Military Occupational Specialty Designator (MOSD) of an enlisted soldier that is next in significance to the primary MOSD - Enlisted | DCSPER
AMOS | ADDITIONAL MOSD - ENLISTED | Designates a Military Occupational Specialty Designator (MOSD) that is in addition to the primary and secondary MOSDs | DCSPER
ASI | ASI - COMMISSIONED OFFICER; ASI - ENLISTED; ASI - WARRANT OFFICER | Commissioned Officer (CO) - An Additional Skill Identifier (ASI) indicating a specialized skill that is required to perform the duties of a position but is not necessarily related to any one particular specialty; Enlisted (ENLD) - An Additional Skill Identifier (ASI) indicating a specialized skill closely related to or an adjunct to that required by an enlisted MOS; Warrant Officer (WO) - An Additional Skill Identifier (ASI) indicating a specialized skill or equipment unique to a position to identify those qualified for a position | DCSPER
GOOD_YRSVC | TOTAL SATISFACTORY YEARS RET | The number of years of military service that a soldier is credited with having served that are acceptable for retirement purposes | DCSPER
DSSI | DUTY POSD | Specifies the duty that a soldier is actually performing (Consists of the soldier's MOS, a First Duty ASI, and either a Second Duty ASI or a Duty Language Identifier) | DIRFP
UCAG | USAR COMMAND OF ASSIGNMENT | An organization in the United States Army Reserve that is normally commanded by a General Officer and responsible for units within its command structure or within a specified geographical boundary | DIRFP
FCPSCD | INDIVIDUAL FAMILY CARE PLAN | Indicates the status of the arrangements required of sole parents or military couples to provide for their dependents while involved in wartime duties | DCSPER
FCPSDT | FAMILY CARE PLAN SUBMISSION DATE | The most recent date a Family Care Plan was submitted | DCSPER
SOPTDD | SOLE PARENT DEPENDENT DESIGN | Designates a soldier as the sole parent of a dependent | DCSPER
MILSPI | MILITARY SPOUSE INDICATOR | Indicates that a soldier's spouse is also in the military | DCSPER
MUSARC | MAJOR USAR COMMAND ASG | A Reserve Command directly subordinate to, and constituting a major mission element of, a major Army subcommand (A numeric 1st position indicates the US Army) | DIRFP
DtADTE | ACCESSION DATE | Actual date a soldier was gained into the current reserve component category FORMAT: CCYY | DCSPER
ORIG | ORIGINATOR CODE | A code to uniquely identify each originator submitting data to the system (Consists of the Data Entry Point (MUSARC code) + the Originator Designator (specific office w/in an agency) + the Data Entry Clerk (specific user id)) | DCSPER
DMOSQ | DUTY QUALIFICATION CODE | A code indicating the Commander's evaluation of the ability of the soldier's qualification to perform the duties of the assigned position as defined by AR 140-185, Table 1-1 | DCSOPS
UNITNAME | UNIT NAME | The name of the unit to which a soldier is assigned | DIRFP
TIER | | Unit Priority | DCSOPS
MSCNAME | | MSC Name | DIRFP
RSC | | RSC Name | DIRFP
PRI | FIRST 3 CHAR OF PRIMARY SPEC | SMG18CWE.DBF ONLY - FIRST 3 CHAR OF SM PRIMARY SPEC | DCSPER
PRIX | FOURTH CHAR OF PRIMARY SPEC | SMG18CWE.DBF ONLY - FOURTH CHAR OF SM PRIMARY SPEC | DCSPER
SEC | FIRST 3 CHAR OF SECOND SPEC | SMG18CWE.DBF ONLY - FIRST 3 CHAR OF SM SECOND SPEC | DCSPER
SECX | FOURTH CHAR OF SECOND SPEC | SMG18CWE.DBF ONLY - FOURTH CHAR OF SM SECOND SPEC | DCSPER
ALT | FIRST 3 CHAR OF ALT SPEC | SMG18CWE.DBF ONLY - FIRST 3 CHAR OF SM ALT SPEC | DCSPER
ALTX | FOURTH CHAR OF ALT SPEC | SMG18CWE.DBF ONLY - FOURTH CHAR OF SM ALT SPEC | DCSPER
ASVABCL | ASVAB - CLERICAL | The score earned by a soldier on the Clerical portion of the ASVAB | DCSPER
ASVABCO | ASVAB - COMBAT ORIENTATION | The score earned by a soldier on the Combat Orientation portion of the ASVAB | DCSPER
ASVABEL | ASVAB - ELECTRICAL | The score earned by a soldier on the Electrical portion of the ASVAB | DCSPER
ASVABFA | ASVAB - FIELD ARTILLERY | The score earned by a soldier on the Field Artillery portion of the ASVAB | DCSPER
ASVABOF | ASVAB - FOOD SERVICE | The score earned by a soldier on the Food Service portion of the ASVAB | DCSPER
ASVABGT | ASVAB - GENERAL TECHNICAL | The score earned by a soldier on the General Technical portion of the ASVAB | DCSPER
ASVABGM | ASVAB - GENERAL MAINTENANCE | The score earned by a soldier on the General Maintenance portion of the ASVAB | DCSPER
ASVABMM | ASVAB - MOTOR MAINTENANCE | The score earned by a soldier on the Motor Maintenance portion of the ASVAB | DCSPER
ASVABSC | ASVAB - SKILL COMMUNICATIONS | The score earned by a soldier on the Skill Communications portion of the ASVAB | DCSPER
ASVABST | ASVAB - SKILL TECHNICAL | The score earned by a soldier on the Skill Technical portion of the ASVAB | DCSPER
DtEFFDG | EFFECTIVE DATE OF GRADE | The date a soldier's grade became effective FORMAT: CCYY | DCSPER
FirstOfBonus | N/A | Bonus Info from Recruiting Command | USAREC
*LossTyp | MPA Type from XTX | Code describing Type of loss | XTX
*LossTypDesc | | Description of MPA Type | XTX
LossRsn | MPA Reason from XTX | Code describing Reason of loss | XTX
*LossRsnDesc | | Description of MPA Reason | XTX
*MPAORG | CurrOrg for Loss | Code describing Loss Destination | XTX
*Title | | Description of Loss Destination Code | XTX
*DtLOSS | | Year Loss Occurred | XTX
DEPPER | CurrOrg orig | IDs personnel who were in DEP | XTX
LastOfDMOS | DMOS for mobilized pers. | DMOS for last Mobilization | ALLMOB
SumOfDURATION | | # days mobilized | ALLMOB
LastOfAPC_DESC | | Operation last mobilized for | ALLMOB
CombatFLG | | Count of mobilizations with Hostile Fire Pay | ALLMOB
LastOfUIC | | Last UIC mobilized for | ALLMOB
CNTDPLY | | # of times Mobilized | ALLMOB
Total Of Amount | | Total Bonus Amount since FY 1996 | DJMSRC
AMT02 | | Bonus Amount for that FY | DJMSRC
AMT03 | | Bonus Amount for that FY | DJMSRC
AMT04 | | Bonus Amount for that FY | DJMSRC
AMT05 | | Bonus Amount for that FY | DJMSRC
AMT06 | | Bonus Amount for that FY | DJMSRC
BasMGIB | | Amount paid from VA for ed benefits | DMDC
DtMGIB | | Last FY of Ed Benefits | DMDC
KicMGIB | | Amount paid from VA for ed benefits (kicker) | DMDC


Table 10. Data Dictionary (Calculated Fields)

DBF NAME | DESCRIPTION | OWNING
Partition | Random generated field to separate training, testing and validating data | CALCULATED
*LOST | Generated from XTX files (flag for loss) | CALCULATED
CIVO_BIN | BIN of civilian occupations per TAPDB-R Descriptions | CALCULATED
Profile | Flag to indicate a permanent profile | CALCULATED
Mobilized | Flag indicating mobilization since 9/11 | CALCULATED
Deployed | Flag indicating deployment to a warzone since 9/11 | CALCULATED
*TISatLOSS | # of Years of service at time a loss occurred | CALCULATED
RRC | 2 digit indicator of RRC | CALCULATED
CMF | 2 digit indicator of Career Management Field | CALCULATED
*Destination | Calculated from MPAORG for loss data (OUT, Retired, Military, TPU) | CALCULATED
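The Partition field above is described only as a randomly generated field. A minimal sketch of one way to generate such a field follows. The 30/60/10 weights are an assumption inferred from the partition sizes reported in the results tables (for example 33,353 / 67,003 / 11,172 for junior soldiers); the seed and function name are illustrative and the original tool's method is not documented here.

```python
import random

# Assumed target shares, inferred from the reported partition sizes.
PARTITION_WEIGHTS = (("1_Training", 0.30), ("2_Testing", 0.60), ("3_Validation", 0.10))


def assign_partition(rng, weights=PARTITION_WEIGHTS):
    """Draw one partition label according to the cumulative weights."""
    draw = rng.random()
    cumulative = 0.0
    for label, weight in weights:
        cumulative += weight
        if draw < cumulative:
            return label
    return weights[-1][0]  # guard against floating-point round-off


rng = random.Random(2006)  # fixed seed so the split is reproducible
labels = [assign_partition(rng) for _ in range(10_000)]

for label, _ in PARTITION_WEIGHTS:
    share = labels.count(label) / len(labels)
    print(f"{label:13s} {share:.1%}")
```

Fixing the seed keeps the assignment stable across reruns, so that the same records land in the same partition each time a model is rebuilt.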
Table 11. Data Audit of 82 input fields

[Histogram thumbnails omitted. Only the rows below are legible in the
source; a blank cell marks a value that is not legible.]

FIELD | TYPE | MIN | MAX | MEAN | STD. DEV. | SKEWNESS | UNIQUE
STATE | set | -- | -- | -- | -- | -- | 110
GRAS | set | -- | -- | -- | -- | -- | 22
GRADE | set | -- | -- | -- | -- | -- | 22
DtPEBD | range | 0 | 2006 | 1995.443 | 9.518 | -53.629 |
DtEXP | range | 1900 | 2026 | 2002.742 | 9.479 | -1.240 |
DtTPUEXP | range | 1901 | 2060 | 2007.346 | 4.895 | 1.647 |
DtLRA | range | 1920 | 2028 | 1997.207 | 7.470 | -1.447 |
ACT_FEDSVC | range | 0.000 | 3173.00 | 172.226 | 293.046 | 2.734 |
PPSC | range | 111111 | 444444 | 115714.5 | 27519.048 | 6.699 |
PHCC | set | -- | -- | -- | -- | -- | 19
APFT_IND | set | -- | -- | -- | -- | -- | 4
DEPL | set | -- | -- | -- | -- | -- | 48
MILED_COMP | set | -- | -- | -- | -- | -- | 37
CIED | set | -- | -- | -- | -- | -- | 29
CVEL | set | -- | -- | -- | -- | -- | 28
DtENTRY | range | 0 | 2006 | 1994.646 | 22.522 | -74.386 |
DtENTRES | range | | 2006 | 1971.179 | 223.684 | -8.687 |
AFSG | set | -- | -- | -- | -- | -- | 15
AFQT | range | 0.000 | 99.000 | 55.636 | 24.233 | -0.340 |
DtETS | range | | 2080 | 2007.965 | 23.870 | -80.367 |
NEXE | range | | | 1.361 | 0.803 | 3.529 |
CMXT | range | | 48 | 15.183 | 10.874 | 1.265 |
SKLVL | set | -- | -- | -- | -- | -- | 149
GOOD_YRSVC | range | 0.000 | 40.000 | 5.353 | 7.241 | 1.641 |
UCAG | set | -- | -- | -- | -- | -- | 72
FCPSCD | set | -- | -- | -- | -- | -- |
SOPTDD | set | -- | -- | -- | -- | -- |
MILSPI | set | -- | -- | -- | -- | -- |
DtADTE | range | 199 | 2006 | 1997.714 | 6.224 | -70.323 |
DMOSQ | set | -- | -- | -- | -- | -- |
TIER | set | -- | -- | -- | -- | -- | 13
PRIX | range | | | 1.627 | 1.269 | 0.978 |
RRC | set | -- | -- | -- | -- | -- | 12
CMF | set | -- | -- | -- | -- | -- | 76
Destination | set | -- | -- | -- | -- | -- | 4
$C-Destination | set | -- | -- | -- | -- | -- | 4
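The summary statistics in a data audit of this kind can be reproduced with a few lines of code. The sketch below computes the minimum, maximum, mean, standard deviation, skewness, and number of unique values for a numeric field. It uses the population standard deviation and the moment-based skewness, which may differ slightly from the formulas used by the original tool, and the sample data are hypothetical. It also shows how a single zero-coded year can produce the kind of large negative skewness seen in some of the date fields.

```python
import statistics


def audit_numeric(values):
    """Summary statistics for one numeric field."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std > 0:
        # Moment-based skewness: third central moment over std cubed.
        skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    else:
        skewness = 0.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "unique": len(set(values)),
    }


# Hypothetical year field: evenly spread entry years, symmetric about 1998.
clean_years = list(range(1990, 2007)) * 10
# The same field with one missing value coded as 0.
years_with_zero = clean_years + [0]

clean_audit = audit_numeric(clean_years)
zero_audit = audit_numeric(years_with_zero)
print("clean:    ", {k: round(v, 3) for k, v in clean_audit.items()})
print("with zero:", {k: round(v, 3) for k, v in zero_audit.items()})
```

A strongly negative skewness alongside a minimum of 0 in a year field is therefore a useful flag for missing or miscoded dates rather than a property of the population itself.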